(54) Surveillance recording device and method

(57) A surveillance recording device using cameras (1)(2).
Facial images and whole body images of a person are
extracted from images shot by the cameras (1)(2), a height
is calculated from the whole body images, and retrieval
information including a facial image (best shot) is
associated with images in a recording medium (90), recorded
into a database (17), and utilized later as an index for
retrieval within the recording medium (90). Facial images
are displayed in a list of thumbnails to make it easy to
retrieve a target person on a thumbnail screen. The images
are displayed together with a moving image of a target
person.
Description

[0001] The present invention relates to a surveillance 
recording device and method for surveilling comings 
and goings of people with a camera. 
[0002] In facilities having objects to be protected, for 
example, in a bank, a surveillance camera is set up to 
surveil comings and goings of people. As one such
conventional surveillance recording device which can 
record images over a long period of time, a time lapse 
video exists. A time lapse video is a device for com- 
pressing images obtained from a camera and storing the 
images onto a VHS video tape over a long period of time. 
[0003] In this device, in order to reduce the amount of
data to be recorded, images inputted from a camera are
recorded at fixed frame intervals while the frames in
between are skipped, and the image quality is lowered.
Therefore, recording onto a videotape over a relatively
long period of time is possible even though the tape length
is the same.
[0004] Furthermore, a method in which images are 
compressed and recorded over a long period of time by 
using an image compression technique such as
MPEG exists.

[0005] However, in a time lapse video, it becomes dif- 
ficult to view recorded images since the level of image 
quality is lowered, and there is a possibility that an en- 
tering person's identity cannot be distinguished. Fur- 
thermore, skipping recording is carried out, so that if a 
key scene occurs during the skipped period, the scene 
including the person who has entered may not be re- 
corded at all. 

[0006] In addition to the abovementioned problems, 
the conventional surveillance recorder simply captures 
camera images. Therefore, after finishing recording, it 
is very difficult for an operator to search for a target scene
or person in long and massive image records.
[0007] For example, when a number of visitors enter
successively, an operator who wants to find a target person
in long and massive image records must search while viewing
all of the recorded images, and this work is troublesome.

[0008] Therefore, an object of the invention is to provide
a surveillance recording device and related techniques by
which it becomes possible for an operator to easily search
for a target person.

[0009] A surveillance recording device according to a 
first aspect of the invention comprises cameras for 
shooting a target space, an image recording and repro- 
ducing unit for recording images shot by the cameras 
onto a recording medium and reproducing images from 
the recording medium, an essential image extracting 
unit for extracting essential images of a person from im- 
ages shot by the cameras, and a retrieval information 
recording unit for recording retrieval information includ- 
ing the essential images. 

[0010] By this construction, an operator can easily 
search for a target image by utilizing retrieval informa- 
tion. 



[0011] In a surveillance recording device according to 
a second aspect of the invention, facial images of people
are included in the essential images.
[0012] By this construction, the operator can easily and
intuitively carry out retrieval while referring to facial
images of people.

[0013] In a surveillance recording device according to 
a third aspect of the invention, whole body images of 
people are included in the essential images. 
[0014] By this construction, an operator can easily
carry out retrieval based on physical characteristics or 
clothing while referring to the whole body images of peo- 
ple. 

[0015] A surveillance recording device according to a 
fourth aspect of the invention comprises a person
characteristics detecting unit for detecting person
characteristics based on the essential images extracted by
the essential image extracting unit, and the retrieval
information includes the person characteristics.
[0016] By this construction, an operator can easily
carry out retrieval based on person characteristics of 
people. 

[0017] In a surveillance recording device according to
a fifth aspect of the invention, person characteristics
include the heights of people.

[0018] By this construction, an operator can easily 
carry out retrieval based on the height of a person. 
[0019] A surveillance recording device according to a
sixth aspect of the invention comprises a best shot
selecting unit for selecting a best shot among facial images
of people, and the retrieval information includes the best
shot facial image.

[0020] By this construction, retrieval can be carried 
out by using clear facial images. 

[0021] A surveillance recording device according to a
seventh aspect of the invention comprises a display unit
and a display image generating unit for generating images
to be displayed on the display unit, wherein the
display image generating unit generates a thumbnail
screen for displaying a list of essential images of people.
[0022] By this construction, an operator can easily 
narrow down a target person on the thumbnail screen. 
[0023] In a surveillance recording device according to
an eighth aspect of the invention, the display image
generating unit generates a detailed information screen
relating to a specific thumbnail specified on the thumbnail
screen, and this detailed information screen includes
essential images of a person, the person characteristics,
and the person shooting time.

[0024] By this construction, an operator can narrow
down a target person on the facial image thumbnail 
screen and review detailed information relating to the 
person, whereby the operator can efficiently carry out 
retrieval. 

[0025] In a surveillance recording device according to
a ninth aspect of the invention, the image recording and
reproducing unit records onto a recording medium only those
sections of images in which the essential image extracting
unit has been able to extract essential images.
[0026] By this construction, useless images that contain
only background and no person are not recorded, so that a
recording medium can be efficiently used.
[0027] A surveillance recording device according to a 
tenth aspect of the invention comprises at least cameras 
for stereoscopically shooting a target space, an image 
recording and reproducing unit for recording images 
shot by the cameras into a recording medium and re- 
producing images from this recording medium, a detec- 
tion wall setting unit for setting a detection wall for de- 
tection of entry of people into the target space, and a 
collision detecting unit for detecting whether or not a
person collides with the detection wall, wherein the
detection wall is a virtual wall composed of a plurality of voxels
depending on the positional relationship with the cam- 
eras, and the thickness of this detection wall is set to be 
sufficiently small with respect to the depth of the target 
space. 

[0028] By this construction, only important sections 
are surveilled, and the calculation amount is reduced, 
whereby an increase in speed and a saving of system 
resources can be achieved at the same time. Further- 
more, entry of people can be detected by only the cam- 
eras that have been installed in the surveillance record- 
ing device in advance, so that additional equipment 
such as a special sensor is not necessary. 
[0029] In a surveillance recording device according to 
an eleventh aspect of the invention, the essential image 
extracting unit extracts essential images of a person
after the collision detecting unit detects collision of a
person, and the retrieval information includes the time at
which the collision detecting unit detects collision of
the person.

[0030] By this construction, a useless extraction process
before the person collides with the detection wall is
eliminated, and an operator can easily retrieve images,
using the time of detection as a key.
[0031] In a surveillance recording device according to 
a twelfth aspect of the invention, the image recording 
and reproducing unit starts recording images shot by the 
cameras after the collision detecting unit detects colli- 
sion of a person. 

[0032] By this construction, useless image recording 
until a person collides with the detection wall is eliminat- 
ed, the capacity of a recording medium can be efficiently 
used, and the time for seeking within the recording me- 
dium during retrieval can be reduced. 
[0033] The above, and other objects, features and ad- 
vantages of the present invention will become apparent 
from the following description read in conjunction with 
the accompanying drawings, in which like reference nu- 
merals designate the same elements. 
BRIEF DESCRIPTION OF THE DRAWINGS
[0034]

Fig. 1 is a block diagram of a surveillance recording
device according to an embodiment of the invention;

Fig. 2 is an explanatory view of a detection wall of
the same;

Fig. 3(a) is an illustration of an image shot by cameras
of the same, Fig. 3(b) is an illustration of the
facial image of the same, and Fig. 3(c) is an illustration
of the whole body image of the same;

Fig. 4 is an illustration of a template for facial
direction judgment;

Fig. 5 is a flowchart of the same surveillance recording
device;

Fig. 6 is a status transition drawing of a display
screen of the same; and

Fig. 7(a) is an illustration of a retrieval screen of the
same, Fig. 7(b) is an illustration of a thumbnail
screen of the same, and Fig. 7(c) is an illustration
of a detailed information screen of the same.

[0035] Hereinafter, an embodiment of the invention is 
described in detail with reference to the accompanying 
drawings. Fig. 1 is a block diagram of a surveillance re- 
cording device according to an embodiment of the in- 
vention. 

[0036] As shown in Fig. 1, this device comprises two
cameras, that is, a first camera 1 and a second camera
2. In the present embodiment, as long as stereo vision is
possible with the cameras, the number of cameras may
be three or more, or only one stereo camera may be
used. Cameras whose installation positions and parameters
are known are used as these cameras 1 and 2. The positional
relationship of the cameras 1 and 2 is described in detail
later.
[0037] A control unit 4 controls the respective components
shown in Fig. 1, and camera images shot by the
first camera 1 and second camera 2 are inputted into
the control unit 4 via an interface 3.
[0038] A timer 5 supplies information including the 
current date and time to the control unit 4. An input unit 
6 comprises a keyboard and a mouse, and is used by 
an operator to input information such as detection wall 
information described later, recording start/end informa- 
tion, and retrieval information into the device. 
[0039] A display unit 7 is formed of an LCD or CRT, 
which displays images required by an operator. The im- 
ages to be displayed on the display unit 7 are generated 
by a display image generating unit 8 in procedures de- 
scribed later. 

[0040] An image recording and reproducing unit 9
reads from and writes to a recording medium 90, and stores
and reproduces moving images. Typically, the recording
medium 90 is a large capacity digital recording and
reproducing medium such as a DVD or DVC, and the image
recording and reproducing unit 9 is a player for driving
this medium. Considering the operation time of index search
or fast-forwarding during reproduction of a recording
medium, use of such a large capacity digital image recording
and reproducing medium is advantageous. However, if the
operation time is not regarded as important, an analog
medium such as VHS may be used. The recording format is
optional; however, a format such as MPEG in which images are
compressed is desirable for recording over a long period of
time without lowering the apparent image quality.
[0041] A storing unit 10 is a memory or a hard disk, 
which is read and written by the control unit 4, and stores 
information including detection wall information, facial 
images, whole body images, person characteristics, 
and start/end times. 

[0042] A detection wall setting unit 11 sets a detection
wall described later. A collision detecting unit 12 detects
whether or not a person collides with the detection wall.
[0043] An essential image extracting unit 13 extracts 
essential images showing person characteristics from 
images shot by the cameras 1 and 2. In this embodi- 
ment, essential images are facial images and whole 
body images. 

[0044] A person characteristics detecting unit 14 de- 
tects person characteristics. In this embodiment, the 
heights of people are calculated based on the whole 
body images and are used as person characteristics. 
The weights of people may be estimated from the 
heights. Such characteristics may include gender, age 
bracket, body type, skin or hair color, and eye color. 
[0045] A best shot selecting unit 15 selects best shots
that most clearly show person characteristics among
essential images extracted by the essential image extracting
unit 13. In this embodiment, in a case where there
are several facial images, the best shot selecting unit
15 chooses an image showing a full face, and determines
this image as a best shot. Any facial image may be selected
as a best shot as long as facial characteristics can be
easily recognized in the image.
[0046] Of the information stored in the storing unit 10,
information such as facial images (best shots), whole
body images, person characteristics, and start/end times
that can be utilized later as indexes for retrieval of
moving images within the storing unit 10 is recorded in a
database 17 as moving image retrieval information. A
database engine 16 searches the database 17 or registers
information into the database 17 under control of
the control unit 4.

[0047] Herein, in the present embodiment, the data- 
base 17 corresponds to the "retrieval information re- 
cording means" in claims hereof. Moving image retrieval 
information may be directly recorded into a recording 
medium 90 without especially providing databases or 
database engines if the format of the recording medium 
90 allows for such. 

[0048] In this case, the "recording medium" and "retrieval
information recording means" in claims hereof
are integrated, and this construction is also included in
the present invention.

[0049] As above, two constructions are described, that is,
a construction in which a "recording medium" and a
"retrieval information recording means" in claims hereof are
integrated together, and a construction in which they are
separated from each other. Whichever construction is
employed, typically, by using the format of MPEG7, retrieval
information, moving images and still images shot by the
cameras 1 and 2, or other necessary data (hereinafter
referred to as "quoting data") may be quoted in metadata
(the format may be binary or ASCII).

[0050] In this case, metadata and quoting data may
be in the same recording medium or not. For example,
quoting data may be quoted via a network from a recording
medium including existence of metadata.
[0051] Herein, by using the format of MPEG7, descriptors
are used in metadata to categorize quoting
data, and while classifying retrieval information, pieces
of data having a mutual relationship can be collectively
and smartly quoted. This construction is also included
in the present invention.

[0052] Next, the detection wall in the present embodiment
is described with reference to Fig. 2. An entry is
provided in one wall surface 21 of a target space 20,
and this entry is provided with doors 22 and 23 that can
be opened and closed. The surveillance recording device of
this embodiment surveils movements of people and objects
entering the inside of the target space 20 (in the direction
of the arrow N).
[0053] Therefore, the first camera 1 and second camera 2
are installed toward the wall surface 21 side, and
the positional relationship and parameters of the cameras
are generally known.

[0054] In this construction, in the present embodiment,
a detection wall 24 is provided slightly ahead of
the doors 22 and 23. This detection wall 24 is a virtual
thin wall, and is formed in parallel with the wall surface 21
in this example. The inside of the detection wall 24 is
filled with a number of voxels 25. Preferably, the detection
wall 24 is formed to be as thin as possible in order
to reduce the amount of detection processing. For example,
as shown in the figure, the thickness is set to be
equivalent to that of one voxel. The thickness of the
detection wall 24 may be set to be equivalent to that of two
or more voxels; however, at least, the thickness is set
to be sufficiently small with respect to the depth of the
target space 20 (the length in the arrow N direction).
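
As a rough illustration of the thin detection wall described above, the following Python sketch builds a planar wall that is one voxel thick and returns the voxel centres. The function name, the coordinate convention (x across the entry, y upward, z along the arrow N direction), and the dimensions are assumptions made for this example only; the embodiment does not prescribe them.

```python
def make_detection_wall(width, height, z_offset, voxel_size, layers=1):
    """Return the centres (x, y, z) of the voxels of a thin planar wall.

    width, height : extent of the wall parallel to the wall surface
    z_offset      : distance of the wall ahead of the doors (arrow N direction)
    voxel_size    : edge length of one cubic voxel
    layers        : wall thickness in voxels (1 keeps the wall as thin as possible)
    """
    nx = int(round(width / voxel_size))
    ny = int(round(height / voxel_size))
    centres = []
    for k in range(layers):
        z = z_offset + (k + 0.5) * voxel_size
        for j in range(ny):
            y = (j + 0.5) * voxel_size
            for i in range(nx):
                x = (i + 0.5) * voxel_size
                centres.append((x, y, z))
    return centres


# Hypothetical 2 m x 2 m wall, 0.5 m ahead of the doors, 10 cm voxels.
wall = make_detection_wall(width=2.0, height=2.0, z_offset=0.5, voxel_size=0.1)
print(len(wall))
```

Because the number of voxels grows linearly with `layers`, keeping the wall one voxel thick keeps the number of voxels to be examined at its minimum for a given wall area.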
[0055] As mentioned above, in the present embodiment,
two cameras 1 and 2 are set so as to have points
of view that are different from each other, and these
cameras 1 and 2 shoot the wall surface 21 side from
different directions.

[0056] Herein, when a person enters the inside of the
target space 20 from the entry in the wall surface 21, a
silhouette of the person is shot on the image planes of
the respective cameras 1 and 2. Then, when the person
advances inside the target space 20 and collides with
the detection wall 24, this collision can be detected by
the following procedures. The detection wall 24 is a
virtual wall. Therefore, even when the person collides with
the detection wall, he/she is not obstructed from advancing
at all, does not recognize the collision, and can pass
through the detection wall 24.

[0057] Herein, it can be judged whether or not the voxels
composing the detection wall 24 overlap the person
by the following principle.

[0058] Voxels that do not overlap the person are outside
the person image in the camera image of at least
one of the cameras 1 and 2. Voxels that overlap the person
are inside the person images in camera images of
all cameras.

[0059] In other words, if a certain voxel is within person
images in camera images of all cameras, this
means that the voxel overlaps the person. On the contrary,
if a voxel is outside a person image in a camera
image of either one of the cameras, this means that the
voxel does not overlap the person.
[0060] Therefore, among the voxels 25 composing
the detection wall 24, if the number of voxels that are
within person images in images of all cameras 1 and 2
is one or more, the collision detecting unit 12 judges that
the person has collided with the detection wall 24.
[0061] On the contrary, among the voxels 25 composing
the detection wall 24, if there is no voxel that is within
person images in images of all cameras 1 and 2, it is
judged that the person has not collided with the detection
wall 24.
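
The judgment principle above can be sketched in Python as follows. This is only an illustrative sketch: each camera is represented by a hypothetical pair of a projection function (voxel position to pixel coordinates) and a binary person image (silhouette mask), and the toy projections used in the example are assumptions that stand in for the real, calibrated camera geometry.

```python
def voxel_overlaps_person(voxel, cameras):
    """A voxel overlaps the person only if it projects inside the
    person image (silhouette) of every camera."""
    for project, silhouette in cameras:
        u, v = project(voxel)  # pixel column u, pixel row v
        inside = (0 <= v < len(silhouette)
                  and 0 <= u < len(silhouette[0])
                  and silhouette[v][u])
        if not inside:
            return False       # outside the person image of at least one camera
    return True


def collision_detected(wall_voxels, cameras):
    """Collision is judged when one or more wall voxels overlap the person."""
    return any(voxel_overlaps_person(vx, cameras) for vx in wall_voxels)


# Toy setup (assumed): camera 1 looks along z, camera 2 looks along x.
sil1 = [[False] * 4 for _ in range(4)]
sil2 = [[False] * 4 for _ in range(4)]
sil1[1][1] = True   # person seen at row 1, column 1 by camera 1
sil2[1][2] = True   # person seen at row 1, column 2 by camera 2
cameras = [
    (lambda p: (p[0], p[1]), sil1),
    (lambda p: (p[2], p[1]), sil2),
]

print(collision_detected([(1, 1, 0), (1, 1, 2)], cameras))
```

Only the voxels of the wall are tested, so the cost of one judgment is proportional to the number of wall voxels times the number of cameras.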

[0062] Thus, by means of the thin detection wall 24
composed of voxels, the fact that a person has entered
the target space 20 can be detected. Furthermore, as
mentioned above, by forming the detection wall 24 as
thin as possible, the number of voxels to be examined
by the collision detecting unit 12 can be reduced. As a
result, the amount of computation can be reduced,
high-speed processing can be realized, and the burden
on system resources can also be reduced.
[0063] Moreover, entry of a person can be detected
by only the cameras for shooting surveillance images
(installed in advance), and provision of other components
in addition to the cameras, for example, an infrared-ray
sensor for sensing passage of people, is not necessary.
[0064] Although Fig. 2 shows a flat detection
wall 24, the detection wall 24 may be formed
into any shape as long as the wall is composed of
voxels. The detection wall can be freely changed into,
for example, a curved shape, a shape with a bent portion,
steps, or a shape enclosed by two or more surfaces
in accordance with a target to be captured by surveillance.

[0065] Incidentally, enclosure of a target to be captured
by such a free encircling net is very difficult when
using the abovementioned infrared-ray sensor.
[0066] Next, referring to Fig. 3, the essential image
extracting unit 13 is explained in detail. As mentioned
above, in the present embodiment, the essential image
extracting unit 13 extracts essential images (facial
images and whole body images) showing person
characteristics among images shot by the cameras 1 and 2.
[0067] A shot image is, for example, as shown in Fig. 
3(a). In Fig. 3(a), doors 22 and 23 are taken in the back- 
ground, and in front of the doors, an image of a woman 
is taken at the left position of the image. 
[0068] Herein, in the present embodiment, the essen- 
tial image extracting unit 13 uses two templates, that is, 
as shown in Fig. 3(a), a first template T1 (for face de- 
tection) of a small ellipse that is long horizontally, and a 
second template T2 (for detection of portions other than 
the face) of a large ellipse that is long vertically. Then, 
the essential image extracting unit 13 carries out template
matching in the usual manner to calculate correlation
between the shot image and these templates T1 and T2,
and calculates the point with maximum correlation in the
shot image.

[0069] As a result, as shown in Fig. 3(a), when suffi- 
cient matching is obtained (comparison with threshold 
values may be properly made), as shown in Fig. 3(b), 
the essential image extracting unit 13 extracts images 
in the vicinity of the template T1 as facial images. Fur- 
thermore, as shown in Fig. 3(c), images in the vicinity of 
both templates T1 and T2 are extracted as whole body 
images. 
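
The template matching step can be illustrated with the following minimal Python sketch. It uses normalised cross-correlation on small grey-level arrays as one common way of calculating correlation; the source only states that template matching is carried out "in the usual manner", so the correlation measure, the function names, and the toy data here are assumptions for illustration.

```python
import math


def ncc(patch, template):
    """Normalised cross-correlation between two equally sized 2-D lists."""
    a = [p for row in patch for p in row]
    b = [t for row in template for t in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0   # flat patches carry no correlation


def best_match(image, template):
    """Slide the template over the image; return (score, row, col) of the maximum."""
    th, tw = len(template), len(template[0])
    best = (-1.0, 0, 0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best


# Toy 5x5 image containing the 2x2 template at row 2, column 1.
template = [[1, 2], [3, 4]]
image = [[0] * 5 for _ in range(5)]
image[2][1:3] = [1, 2]
image[3][1:3] = [3, 4]

score, row, col = best_match(image, template)
print(row, col)
```

In practice the maximum score would be compared with a threshold value before the region around the matched position is cut out as a facial image or whole body image.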

[0070] As essential images, only facial images are 
sufficient in practical use. The method for extracting fac- 
es from the shot image is not limited to the abovemen- 
tioned method. Other than this, for example, a method 
involving detection of face parts and a method involving 
extraction of skin-color regions can be optionally select- 
ed. 

[0071] As shown in Fig. 3(c), the person characteristics
detecting unit 14 determines the height H of the
whole body images extracted by the essential image
extracting unit 13 as the height of the shot person.
This height H can be easily determined from
the number of pixels of the whole body images since the
geometric positions of the cameras 1 and 2 are known.
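
One simple way to convert a pixel count into a height is the pinhole camera approximation, sketched below in Python. The source does not state which formula is used; the function name, the focal length expressed in pixels, and the assumption that the distance from the camera to the person is known (for example, from the position of the detection wall) are all hypothetical.

```python
def estimate_height_m(pixel_height, distance_m, focal_length_px):
    """Pinhole approximation: real height = pixel height * distance / focal length."""
    return pixel_height * distance_m / focal_length_px


# Hypothetical values: a 340-pixel-tall whole body image, 4 m from the camera,
# with a focal length of 800 pixels.
height = estimate_height_m(pixel_height=340, distance_m=4.0, focal_length_px=800)
print(height)
```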
[0072] Next, best shot selection by the best shot se- 
lecting unit 15 is explained with reference to Fig. 4. 
[0073] As mentioned above, in the present embodi- 
ment, when there are several facial images, the best 
shot selecting unit 15 chooses a full facial image and 
determines it as a best shot. This is because person face 
characteristics become most clear when the person 
turns his/her face frontward. 

[0074] As described later, a certain period of time
normally elapses from a collision of a person with the
detection wall 24 until the end of shooting of the person.
Therefore, during the period, images of several frames are
shot and it is possible that several facial images of the
person are obtained. The best shot selecting unit 15
selects an image in which the person is most clearly shot
among these images.
[0075] In the present embodiment, judgment of face
direction is made. Specifically, the best shot selecting
unit 15 has a template of a standard full face as shown
in Fig. 4, and carries out matching between the facial
images extracted by the essential image extracting unit
13 and this template. Then, a facial image that is best
matched with the template is regarded as a best shot.
[0076] As another judgment of face direction, it is also 
allowed that the best shot selecting unit 15 determines 
an image with a maximum number of pixels within the 
skin color regions in a color space as a best shot. 
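
The skin-color alternative can be sketched in Python as below. The RGB thresholds in `is_skin` are a rough, illustrative rule chosen for this example and are not taken from the source, which does not specify the color space or the thresholds; the function names are likewise hypothetical.

```python
def is_skin(rgb):
    """Very rough skin test on an (R, G, B) pixel; thresholds are illustrative only."""
    r, g, b = rgb
    return r > 95 and g > 40 and b > 20 and r > g and r > b and (r - min(g, b)) > 15


def skin_pixel_count(face_image):
    """Count the pixels of a facial image that fall in the skin-color region."""
    return sum(1 for row in face_image for px in row if is_skin(px))


def select_best_shot(face_images):
    """Return the index of the facial image with the most skin-colored pixels."""
    return max(range(len(face_images)),
               key=lambda i: skin_pixel_count(face_images[i]))


skin, hair = (200, 150, 120), (30, 20, 10)
profile = [[skin, hair], [hair, hair]]   # face turned away: little skin visible
frontal = [[skin, skin], [skin, hair]]   # full face: more skin visible

print(select_best_shot([profile, frontal]))
```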
[0077] Alternatively, in place of the face direction
judgment, a best shot can be determined by judging the
timing. Since the walking speed of a person is ordinarily
known, the time from when a person collides with
the detection wall 24 until the person is most clearly
shot by the cameras 1 and 2 can be roughly estimated.
Therefore, the best shot selecting unit 15 may determine a
best shot based on this time.
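
The timing-based estimate amounts to simple arithmetic, sketched below in Python. The distance, walking speed, and frame rate are hypothetical example values, and the function name is an assumption; the source only says that the time can be roughly estimated.

```python
def best_shot_frame(distance_to_best_position_m, walking_speed_mps, frames_per_second):
    """Frame index, counted from the collision, at which the person is
    expected to be most clearly shot."""
    delay_s = distance_to_best_position_m / walking_speed_mps
    return round(delay_s * frames_per_second)


# Hypothetical values: best position 3 m past the wall, 1.5 m/s walking speed, 30 fps.
frame_index = best_shot_frame(3.0, 1.5, 30)
print(frame_index)
```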

[0078] Next, the shooting and recording flow by the 
surveillance recording device according to the present 
embodiment is explained with reference to Fig. 5. 
[0079] First, in step 1, the control unit 4 clears the
storing unit 10, and the detection wall setting unit 11 sets
the detection wall 24 (step 2). Herein, the control unit 4
requires an operator to input detection wall information
from the input unit 6, or if the information is already
known, the information may be loaded from an external
storing unit.

[0080] Next, in step 3, the control unit 4 starts input- 
ting images from the first camera 1 and second camera 
2. Then, the control unit 4 directs the collision detecting 
unit 12 to detect whether or not a person has collided 
with the detection wall 24, and the collision detecting 
unit 12 feeds back detection results to the control unit 4. 
[0081] If collision is not detected, the control unit 4
advances the process to step 16, confirms that no
instruction to end recording has been inputted from the
input unit 6, and then returns the process to step 3.
[0082] When collision is detected, in step 5, the con- 
trol unit 4 obtains current date information from the timer 
5, and stores this date information as a start time into 
the storing unit 10. 

[0083] Next, in step 6, the control unit 4 transmits a
shot image to the essential image extracting unit 13 and
commands the unit to extract essential images. Receiving
this command, the essential image extracting unit
13 attempts to extract facial images and whole body
images from the shot image.

[0084] At this point, when extraction is successfully 
carried out (step 7), the essential image extracting unit 
13 adds facial images and whole body images into the 
storing unit 10 (step 8), and notifies the control unit 4 of 
successful completion of extraction. Receiving this no- 
tification, the control unit 4 instructs the image recording 
and reproducing unit 9 to record the shot image as mov- 
ing images. As a result, moving images are stored in the 
recording medium 90. 



[0085] On the other hand, when the extraction fails in
step 7 (for example, when a person has moved out of
the fields of view of the cameras), the control unit
4 checks in step 10 whether or not essential images have
been stored in the storing unit 10.

[0086] When these have been stored, the control unit
4 judges that shooting of a person has been completed,
and executes the next processing. First, in step 11, the
control unit 4 obtains current date information from the
timer 5 and stores this date information into the storing
unit 10 as an end time.

[0087] In step 12, the control unit 4 transmits the
whole body images in the storing unit 10 to the person
characteristics detecting unit 14, directs the unit to
calculate height as a person characteristic, and stores the
calculation result into the storing unit 10. Furthermore,
in step 13, the control unit 4 directs the best shot
selecting unit 15 to select a best shot and obtains a
selection result.

[0088] When the abovementioned processing is ended,
in step 14 the control unit 4 registers information useful
for retrieval of moving images (moving image retrieval
information), including a best shot facial image, whole
body image, start time, end time, and person
characteristics, in the database 17 by using the database
engine 16.

[0089] After completing registration, in step 15, the
control unit 4 clears the moving image retrieval
information from the information stored in the storing unit
10, advances the process to step 16, and prepares for the
next processing.

[0090] In step 10, when there is no essential image in
the storing unit 10, the control unit 4 judges that not a
person but some object has collided with the detection
wall 24, advances the process to step 16, and prepares
for the next processing.

[0091] In the processes mentioned above, the order 
of steps 11 through 13 may be freely changed. 
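
The recording flow described above can be summarised by the following Python sketch, which simulates the flow over a sequence of frames. It is a simplified model under stated assumptions: each frame is a hypothetical dictionary holding a collision flag and an extracted essential image (or `None` when extraction fails), the first stored image stands in for best shot selection, and height calculation and actual moving image recording are omitted.

```python
def run_recording(frames, clock):
    """frames: iterable of dicts with 'collision' (bool) and 'essential' (image or None).
    clock: callable returning the current time.
    Returns the list of registered retrieval records."""
    database = []
    store = {"start": None, "essentials": []}
    tracking = False
    for frame in frames:
        if not tracking:
            if not frame["collision"]:          # no collision: keep waiting
                continue
            store["start"] = clock()            # step 5: store the start time
            tracking = True
        if frame["essential"] is not None:      # steps 6 to 8: extraction succeeded
            store["essentials"].append(frame["essential"])
            continue                            # the frame would be recorded here
        if store["essentials"]:                 # step 10: a person was being shot
            database.append({                   # steps 11 to 14
                "start": store["start"],
                "end": clock(),
                "best_shot": store["essentials"][0],  # placeholder for best shot selection
                "count": len(store["essentials"]),
            })
        # step 15, or an object rather than a person collided: clear and wait again
        store = {"start": None, "essentials": []}
        tracking = False
    return database


ticks = iter(range(100))
frames = [
    {"collision": False, "essential": None},
    {"collision": True, "essential": "face_a1"},
    {"collision": False, "essential": "face_a2"},
    {"collision": False, "essential": None},   # person has left the field of view
    {"collision": True, "essential": None},    # an object, not a person
]
records = run_recording(frames, clock=lambda: next(ticks))
print(records)
```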
[0092] By this construction, it can be understood that
moving images in only a period in which a person is shot
by the cameras after collision with the detection wall is
detected are recorded in the storing unit 10. That is,
useless recording in a period in which no person is shot by
the cameras is omitted, so that efficient operation is
possible. In addition, moving image retrieval information is
stored in the database 17, and by using this information
as an index, only important scenes can be easily
retrieved and reproduced.

[0093] Next, the retrieval flow of surveillance results
is explained with reference to Fig. 6 and Fig. 7. First, as
shown in Fig. 6, in this retrieval, the display image
generating unit 8 generates three types of screens, that is,
a retrieval screen (Fig. 7(a)), thumbnail screen
(Fig. 7(b)), and detailed information screen (Fig. 7(c)) in
accordance with the circumstances, and displays them on
the display unit 7.

[0094] These screens are switched from one to another
when an operator clicks each button by using the input
unit 6 as shown in Fig. 6.

[0095] First, in the retrieval screen shown in Fig. 7(a),
the abovementioned moving image retrieval information
(registered in the database 17) is inputted. In the example
shown in the figure, a date and height are inputted;
however, this is just one example, and the input
information may be changed as appropriate.
[0096] Then, when the moving image retrieval infor- 
mation is inputted and the retrieval start button is 
clicked, the control unit 4 directs the database engine 
16 to retrieve a corresponding piece of moving image 
retrieval information, and the retrieval results are trans- 
mitted to the display image generating unit 8. 
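
A retrieval by date and height of this kind can be sketched with Python's built-in `sqlite3` module as below. The table name, column names, file names, and sample rows are all invented for this illustration; the source does not specify how the database 17 or the database engine 16 is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE retrieval_info (
    id INTEGER PRIMARY KEY,
    start_time TEXT, end_time TEXT, height_cm REAL, best_shot_path TEXT)""")
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO retrieval_info (start_time, end_time, height_cm, best_shot_path) "
    "VALUES (?, ?, ?, ?)",
    [("2001-05-01T09:00:05", "2001-05-01T09:00:12", 172.0, "face_0001.jpg"),
     ("2001-05-01T13:30:40", "2001-05-01T13:30:49", 158.0, "face_0002.jpg"),
     ("2001-05-02T10:15:00", "2001-05-02T10:15:08", 171.0, "face_0003.jpg")])


def search(conn, date, min_height_cm, max_height_cm):
    """Return (best shot, start time) pairs for people shot on the given date
    whose height lies in the given range."""
    rows = conn.execute(
        "SELECT best_shot_path, start_time FROM retrieval_info "
        "WHERE substr(start_time, 1, 10) = ? AND height_cm BETWEEN ? AND ? "
        "ORDER BY start_time",
        (date, min_height_cm, max_height_cm))
    return rows.fetchall()


results = search(conn, "2001-05-01", 165, 180)
print(results)
```

Each returned row carries the best shot used for the thumbnail and the start time that serves as the index into the recorded moving images.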
[0097] Then, the display image generating unit 8 pre- 
pares thumbnails from corresponding facial images 
(best shots) and displays a list of thumbnails as shown 
in Fig. 7(b). 

[0098] Furthermore, in a case where there are many 
person candidates and it is not possible to display all 
thumbnails at the same time, a next screen button and 
a previous screen button are displayed on the screen. 
Then, when the button is clicked, the remaining thumb- 
nail images are listed and displayed. 
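
The next screen and previous screen behaviour can be sketched as simple paging over the list of candidates. The function name `page_of` and the returned dictionary layout are hypothetical, and the number of thumbnails per screen is an assumed parameter that the specification does not state.

```python
def page_of(items, page, per_page):
    """Return one screen of thumbnails plus flags for the two buttons.

    `page` is zero-based and is clamped to the valid range, so clicking
    "next" on the last screen simply stays on the last screen.
    """
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Ceiling division; an empty list still yields one (empty) screen.
    total_pages = max(1, -(-len(items) // per_page))
    page = min(max(page, 0), total_pages - 1)
    start = page * per_page
    return {
        "items": items[start:start + per_page],
        "page": page,
        "has_previous": page > 0,
        "has_next": page < total_pages - 1,
    }
```

The two flags indicate whether the previous screen button and the next screen button need to be shown for the current screen.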
[0099] An operator checks this list, searches for the data
to be examined based on the facial images, and clicks the
thumbnail which he/she wants to check.
[0100] Then, the control unit 4 informs the display image
generating unit 8 of the desired thumbnail based on
the information inputted through the input unit 6. Receiving
this information, the display image generating unit 8
displays a detailed information screen as shown in Fig.
7(c). In this example, at this point, a facial image (best shot),
whole body image, start time, and person characteristics
(height) are retrieved from the moving image retrieval
information corresponding to the desired thumbnail and
displayed. In addition, the corresponding moving image on
the recording medium 90 is reproduced from this start
time and displayed at the same time.
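
As a rough illustration of how the detailed information screen could be assembled from one piece of retrieval information, the sketch below looks up the record for the clicked thumbnail and returns the fields to display together with the point from which playback begins. The `RetrievalInfo` record, its field names, and `detail_view` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalInfo:
    """Assumed layout of one piece of moving image retrieval information."""
    record_id: int
    best_shot: str      # reference to the best-shot facial image
    whole_body: str     # reference to the whole body image
    start_time: str     # time at which recording of this person started
    height_cm: float


def detail_view(index, selected_id):
    """Collect what the detailed information screen shows for one thumbnail.

    `index` maps record ids to RetrievalInfo; a KeyError is raised if the
    selected thumbnail has no matching record.
    """
    info = index[selected_id]
    return {
        "face": info.best_shot,
        "whole_body": info.whole_body,
        "start_time": info.start_time,
        "height_cm": info.height_cm,
        # The moving image is reproduced from the recorded start time.
        "playback_from": info.start_time,
    }
```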
[0101] The display patterns shown in the figures are 
illustrations, and they may be properly changed for easy 
observation. 

[0102] In this example, retrieval is carried out
in the retrieval screen first; however, in a case where the
amount of data registered in the database 17 is small,
the retrieval process may be omitted, all of the data may
be displayed on the thumbnail screen, and the target
person may be selected there.
[0103] Furthermore, the thumbnail images and shooting
time are displayed together on the thumbnail screen;
however, in addition to the shooting time, other person
characteristics that can be registered into the database
17, for example, gender and age, may be displayed.

[0104] As shown in the figure, when a moving image
is simultaneously displayed, incidental circumstances
such as the number of persons and characters who
entered at the same time as a target person can be
grasped, and for example, it becomes easy to investi-

Claims


1. A surveillance recording device comprising:

cameras(1)(2) for shooting a target space;
an image recording and reproducing means(8)
for recording images shot by the cameras(1)(2)
into a recording medium(90), and reproducing
images from this recording medium(90);
an essential image extracting means(13) for
extracting essential images of a person from
the images shot by the cameras(1)(2); and
a retrieval information recording means(17) for
recording retrieval information including the
essential images.

2. The surveillance recording device according to
Claim 1, wherein the essential images include facial
images of the person.

3. The surveillance recording device according to
Claim 1, wherein the essential images include
whole body images of the person.

4. The surveillance recording device according to
Claim 1, further comprising a person characteristics
detecting means(14) for detecting person characteristics
based on the essential images extracted by
the essential image extracting means(13), wherein
the retrieval information includes the person
characteristics.

5. The surveillance recording device according to 
Claim 4, wherein the person characteristics include 
the height of the person. 

6. The surveillance recording device according to
Claim 2, further comprising a best shot selecting
means(15) for selecting a best shot among the
facial images of the person, wherein
the retrieval information includes the best shot facial
image.

7. The surveillance recording device according to
Claim 2, further comprising a display means(7) and
a display image generating means(8) for generating
display images to be displayed by the display
means(7), wherein the display image generating
means(8) generates a thumbnail screen for displaying
a list of essential images of people.

8. The surveillance recording device according to
Claim 7, wherein the display image generating
means(8) generates a detailed information screen
relating to a specified thumbnail instructed on the
thumbnail screen, and this detailed information
screen includes essential images, characteristics,
and shooting times of people.

9. The surveillance recording device according to
Claim 1, wherein the image recording and reproducing
means(8) records images onto the recording
medium(90) only in sections in which the essential
image extracting means(13) can extract essential
images of people.

10. A surveillance recording device comprising:

cameras(1)(2) for shooting a target space;
an image recording and reproducing means(8)
for recording images shot by the cameras(1)(2)
onto a recording medium(90) and reproducing
images from this recording medium(90);
a detection wall setting means(11) for setting a
detection wall(24) for detecting entry of people
into a target space; and
a collision detecting means(12) for detecting
whether or not a person has collided with the
detection wall(24), wherein
the detection wall(24) is a virtual wall composed
of a plurality of voxels(25) depending on
the positional relationship of the cameras(1)(2),
and the thickness of this detection wall(24) is
set to be sufficiently small with respect to the
depth of the target space.

11. The surveillance recording device according to
Claim 10, wherein the essential image extracting
means(13) extracts essential images of a person
after the collision detecting means(12) detects
collision of a person, and the retrieval information
includes the time at which the collision detecting
means(12) detects collision of the person.

12. The surveillance recording device according to 
Claim 10, wherein the image recording and repro- 
ducing means(8) starts recording images shot by 
the cameras(1)(2) after the collision detecting 
means(12) detects collision of a person. 

13. A surveillance recording device comprising:

cameras(1)(2) for shooting a target space;
an image recording and reproducing means(8)
for recording images shot by the cameras(1)(2)
onto a recording medium(90) and reproducing
images from this recording medium(90);
an essential image extracting means(13) for
extracting essential images of an object from
images shot by the cameras(1)(2); and
a retrieval information recording means(17) for
recording retrieval information including the
essential images.



14. A surveillance recording method in which a target
space is shot by cameras(1)(2) and the shot images
are recorded onto a recording medium(90), wherein
essential images of an object are extracted from the
images shot by the cameras(1)(2), and retrieval
information including these essential images is
associated with the images shot by the cameras(1)(2)
and recorded.

15. A surveillance recording method in which a target
space is shot by cameras(1)(2) and the shot images
are recorded onto a recording medium(90), wherein
essential images of a person are extracted from the
images shot by the cameras(1)(2), and retrieval
information including the essential images is associated
with the images shot by the cameras(1)(2) and
recorded.

16. The surveillance recording method according to
Claim 15, wherein the essential images include
facial images of the person.

17. The surveillance recording method according to
Claim 15, wherein the essential images include
whole body images of the person.



18. The surveillance recording method according to 
Claim 15, wherein person characteristics are de- 
tected based on essential images and the person 
characteristics are included in the retrieval informa- 
tion. 

19. The surveillance recording method according to 
Claim 18, wherein the person characteristics in- 
clude the height of the person. 

20. The surveillance recording method according to 
Claim 16, wherein a best shot is selected among 
the facial images of the person and the best shot 
facial image is included in the retrieval information. 

21. The surveillance recording method according to 
Claim 16, wherein a thumbnail screen for displaying 
a list of essential images of people is displayed. 




22. The surveillance recording method according to
Claim 21, wherein a detailed information screen
including essential images of a person, person
characteristics, and person shooting times that relate to
a specified thumbnail instructed on the thumbnail
screen is displayed.



23. The surveillance recording method according to
Claim 15, wherein images are recorded onto a
recording medium(90) only in sections in which
essential images of people have been extracted.

24. A surveillance recording method in which a target
space is at least stereoscopically shot by cameras(1)(2)
and the shot images are recorded onto a
recording medium(90), wherein
a virtual detection wall(24) is provided which is composed
of a plurality of voxels(25) depending on the
positional relationship of the cameras(1)(2) and has
a thickness that is sufficiently small with reference
to the depth of the target space; and
entry of a person into the target space is detected
by detecting whether or not the person has collided
with the detection wall(24).

25. The surveillance recording device according to
Claim 24, wherein extraction of essential images of
a person is started after detecting that the person
has collided with the detection wall(24), and the
time at which the person collides with the detection
wall(24) is included in the retrieval information.

26. The surveillance recording method according to
Claim 24, wherein recording of images shot by the
cameras(1)(2) is started after detecting that a
person has collided with the detection wall(24).